(54) An interactive speech recognition device

(57) The present invention relates to an interactive
speech recognition device that recognises speech and
produces sounds or actions in response to the
recognition result.

The device includes a microphone (1), a speech
analysis area (2), a recognition area (3), a coefficient
setting means (4) and output means (6, 7, 8; 11-16). The
coefficient setting means (4) enables the accuracy of the
output to be improved. Additional features include a
temperature sensor, an air pressure sensor and calendar
means to improve the accuracy further and to enable the
output to be adaptive.

Description

The present invention relates to an interactive
speech recognition device that recognises speech and
produces sounds or actions in response to the
recognition result.

One example of this kind of interactive speech
recognition device is a speech recognition toy. For
example, in the speech recognition toy disclosed in Japanese
patent application No. H62-253093, multiple instructions
that will be used as speech instructions are
pre-registered as recognition-target phrases. The speech
signal issued by the child who is using the toy is
compared to the speech signals that have been registered,
and when there is a match, the electrical signal
pre-specified for the speech instruction is output and causes
the toy to perform a specified action.

However, in conventional toys of this type, such as
stuffed toy animals, that issue phrases or perform
specified actions based on the speech recognition result, the
recognition result is often different from the actual word
or phrase issued by the speaker; and even when the
recognition result is correct, the toys usually cannot
respond or return phrases that accommodate changes in
the prevailing conditions or environment.

Nowadays, sophisticated actions are required even
of toys. For example, a child will quickly tire of a stuffed
toy animal if it responds with "Good morning" when a
child says "Good morning" to it regardless of the time of
day. Furthermore, because this type of interactive
speech recognition technology possesses the potential
of being applied to game machines for older children or
even to consumer appliances and instruments,
development of more advanced technologies has been
desired.

Therefore, an object of the present invention is to
provide an interactive speech recognition device that
possesses a function for detecting changes in
circumstances or environment, e.g., time of day, that can
respond to the speech issued by the user by taking into
account the change in circumstances or environment,
and that enables more sophisticated interactions.

According to the present invention, there is provided
an interactive speech recognition device for
recognising and responding to input speech comprising:

a speech analysing means for analysing input
speech by comparing it to pre-registered speech
patterns and for creating a speech data pattern;
a speech recognition means for recognising the
input speech by analysing the speech data pattern
and deriving recognition data;
a speech output means for outputting a response
to said input speech using said recognition data;
and characterised by

a coefficient setting means for generating weighted
coefficients for each of the pre-registered speech
patterns and providing said speech recognition
means with said coefficients thereby enabling the
recognition data accuracy to be improved.

The interactive speech recognition device of the
present invention recognises input speech by analysing
and comparing it to pre-registered speech patterns and
responds to the recognised speech; and is characterised
in that it comprises a speech analysis means for
creating a speech data pattern by analysing the input
speech; a variable data detection area for detecting
variable data that affects the interaction content; a
coefficient setting means into which the variable data from
said variable data detection area is input and that
generates a weighting coefficient for each pre-registered
recognition target speech according to said variable
data; a speech recognition means into which the speech
data pattern output by said speech analysis means is
input, that at the same time obtains from said
coefficient setting means the weighting coefficient in effect
at that time for each of the multiple pre-registered
recognition target speeches, that computes the final
recognition data by taking these weighting coefficients
into consideration, that recognises said input speech
based on the computed final recognition data, and that
outputs the final recognition data of the recognised
speech; a speech synthesis means for outputting
synthesised speech data based on the final recognition data
computed by said speech recognition means by
considering said coefficient; and a speech output means for
outputting the output of said speech synthesis means to
the outside.

Said variable data detection means is, for example,
a timing means for detecting time data; and said
coefficient setting means generates a weighting coefficient
that corresponds to the time of day for each of the
pre-registered recognition target speeches. In this case, the
coefficient setting means can be configured to output
the largest weighting coefficient for the recognised data
if it occurs at the time (peak time) when it was correctly
recognised most frequently in the past, and a smaller
weighting coefficient as the time deviates from this peak
time.
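
By way of illustration only, the peak-time behaviour described above can be sketched as follows. The linear fall-off, the floor value of 0.5 and the names time_weight, peak_hour and falloff_per_hour are assumptions made for this sketch; the description only requires that the coefficient be largest at the peak time and smaller as the time deviates from it.

```python
def time_weight(hour, peak_hour, max_weight=1.0, min_weight=0.5,
                falloff_per_hour=0.05):
    """Return a weighting coefficient for a phrase at a given hour (0-24).

    The shape of the curve (linear decay, clamped at min_weight) is an
    assumption for illustration; it is not specified in the description.
    """
    # Distance on a 24-hour clock, so 23:00 and 01:00 are two hours apart.
    diff = abs(hour - peak_hour) % 24
    distance = min(diff, 24 - diff)
    return max(min_weight, max_weight - falloff_per_hour * distance)


# Hypothetical peak: a phrase most often recognised correctly at 7.00 a.m.
for hour in (7, 9, 12, 20):
    print(hour, time_weight(hour, peak_hour=7))
```

In this sketch the coefficient never falls below min_weight, so a phrase spoken far from its peak time is made less likely but is never ruled out.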

Another embodiment of the interactive speech
recognition device of the invention recognises input speech
by analysing and comparing it to pre-registered
speech patterns and responds to the recognised
speech; and is characterised in that it comprises a
speech analysis means for generating a speech data
pattern by analysing the input speech; a speech
recognition means for outputting the recognition data that
corresponds to said input speech based on the speech data
pattern output by said speech analysis means; a timing
means for generating time data; a response content
level generation means into which the time data from said
timing means and at least one of the recognition count
data correctly recognised by said speech recognition
means are input, and that, based on the input data,
generates a response content level for changing the response
content for the input speech; a response content level
storage means for storing the response level that
corresponds to the time obtained by said response content
level generation means; a response content creation
area that determines the response content appropriate for
the response level generated by said response content
level generation means, based on the recognition data
from said speech recognition area, and that outputs the
corresponding response content data; a speech
synthesis means for outputting synthesised speech data that
corresponds to the response content data, based on the
response content data from said response content
creation area; and a speech output means for outputting
the output of said speech synthesis means to the
outside.

Still another embodiment of the interactive speech
recognition device of the present invention recognises
input speech by analysing and comparing it to
pre-registered speech patterns and responds to the recognised
speech; and is characterised in that it comprises a
speech analysis means for generating a speech data
pattern by analysing the input speech; a speech
recognition means for outputting the recognition data that
corresponds to said input speech based on the speech data
pattern output by said speech analysis means; a
variable data detection area for detecting variable data that
affects the interaction content; a response content
creation means into which the variable data from said
variable data detection area and the recognition data from
said speech recognition area are input, and that, based
on said recognition data, outputs the response content
data by taking said variable data into consideration; a
speech synthesis means for outputting synthesised
speech data in response to the response content data
output by said response content creation area; and a
speech output means for outputting the output of said
speech synthesis means to the outside.

Said variable data detection means is a
temperature sensor that measures the temperature of the usage
environment and outputs the temperature data, and said
response content creation means outputs the response
content data by taking said temperature data into
consideration.

Alternatively, said variable data detection means is
an air pressure sensor that measures the
air pressure of the usage environment and outputs the
air pressure data, and said response content creation
means outputs the response content data by taking said
air pressure data into consideration.

Alternatively, said variable data detection means is
a calendar detection means that detects calendar data
and outputs the calendar data, and said response
content creation means outputs the response content data
by taking said calendar data into consideration.

The invention assigns a weighting coefficient to the
recognition data of each of the pre-registered
recognition target speeches, based on the changes in the
variable data (e.g., time of day, temperature, weather, and
date) that affects the content of the interaction. If time
of day is used as the variable data, for example, a
weighting coefficient can be assigned to each
recognition data of recognition target speeches according to the
time of day, and speech recognition that considers the
weighting coefficients can be performed by taking into
consideration whether or not the phrase (in particular, a
greeting phrase) issued by the speaker is appropriate
for the time of day. Therefore, even if the speech
analysis result shows that multiple recognition target
speeches exist that possess a similar speech pattern,
weighting coefficients can increase the differences
among the numerical values of the recognition data that
are ultimately output, thus improving the recognition
rate. The same is also true for the other types of
variable data mentioned above, in addition to time of
day. For example, if weighting coefficients that
correspond to the current temperature are set up, whether or
not the greeting phrase issued by the speaker is
appropriate relative to the current temperature can be
determined. Here again, even if the speech analysis result
shows that multiple recognition target speeches exist
that possess a similar speech pattern, weighting
coefficients can increase the differences among the
numerical values of the recognition data that are ultimately
output, thus improving the recognition rate.

Furthermore, when time of day is used as the
variable data, the relationship between phrases and times
of day that matches actual usage can be obtained by
detecting the time of day at which a particular phrase is
used most often and assigning a large weighting
coefficient to this peak time, and smaller weighting
coefficients to times of day that deviate farther from this peak
time.

Additionally, the response to the speaker's phrase
can be changed as time passes by generating a
response content level for changing the response content
for the input speech, and an appropriate response can be
issued by determining the response content that matches
said response level based on the recognition data from
the speech recognition area.

Furthermore, by using data from instruments such
as a temperature sensor or air pressure sensor, or
variable data such as calendar data, and creating the
response content based on these data, the response
content can be varied widely, enabling more meaningful
interactions.

Embodiments of the present invention will now be
described with reference to the accompanying
drawings, of which:

Figure 1 is a block diagram showing the overall
configuration of the stuffed toy dog of Working example
1 of the present invention;

Figure 2 is a block diagram showing the overall
configuration of Working example 2 of the present
invention;

Figure 3 is a block diagram showing the overall
configuration of Working example 3 of the present
invention;

Figure 4 is a block diagram showing the overall
configuration of Working example 4 of the invention;

Figure 5 is a block diagram showing the overall
configuration of Working example 5 of the present
invention; and

Figure 6 is a block diagram showing the overall
configuration of Working example 6 of the invention.

The invention is explained in detail below using
working examples. Note that the invention has been
applied to a toy in these working examples, and more
particularly to a stuffed toy dog intended for small children.

In Working example 1, weighting coefficients are set
up for the recognition data of pre-registered recognition
target speeches according to the value of the variable
data (e.g., time of day, temperature, weather, and date)
that affects the interaction content, in order to improve
the recognition rate when a greeting phrase is input.
Figure 1 is a configuration diagram that explains Working
example 1 of the present invention. The configuration
will be briefly explained first, and the individual functions
will be explained in detail later in the document. Note
that Working example 1 uses time of day as said variable
data that affects the content of the interaction.

In Figure 1, the interior of stuffed toy dog 30 is
provided with a microphone 1 for entering speech from
outside. A speech analysis area 2 is provided for
analysing the speech that is input from said microphone 1,
and for generating a speech data pattern that matches the
characteristic volume of the input speech. A clock area 5
is a timing means for outputting timing data such as the
time at which said speech is input and the time at which
this speech input is recognised by the speech
recognition area described below. A coefficient setting area 4
receives the time data from said clock area 5 and
generates weighting coefficients that change over time, in
correspondence with the content of each recognition
target speech. A speech recognition area 3 receives the
speech data pattern of the input speech output by said
speech analysis area 2 and at the same time obtains,
from said coefficient setting area 4, the weighting
coefficient in effect at the time for each registered
recognition target speech; it computes the final recognition data
by multiplying the recognition data corresponding to
each recognition target speech by its corresponding
weighting coefficient, recognises said input speech
based on the computed final recognition data, and
outputs the final recognition data of the recognised speech.
A speech synthesis area 6 outputs the speech synthesis
data that corresponds to the final recognition data
recognised by said speech recognition area 3 by taking
said coefficient into consideration. A drive control area 7
drives a motion mechanism 10, which moves the mouth,
etc. of the stuffed toy 30, according to the drive
conditions that are predetermined in correspondence to the
recognition data recognised by said speech recognition
area 3. A speaker 8 outputs the content of the speech
synthesised by said speech synthesis area 6 to the
outside. Finally, there is a power supply area 9 for driving
all of the above areas.

Said speech recognition area 3 in the example uses
a neural network that handles a non-specific speaker
as its recognition means. However, this recognition
means is not limited to the method that handles a
non-specific speaker, and other known methods, such as a
method that handles a specific speaker, DP matching,
and HMM, can be used as the recognition means.

In said motion mechanism 10, a motor 11 rotates
based on the drive signal (which matches the length of
the output signal from speech synthesis area 6) output
by drive control area 7, and when cam 12 rotates in
conjunction with motor 11, a protrusion-shaped rib 13
provided on cam 12 moves in a circular trace in conjunction
with the rotation of cam 12. Crank 15, which uses axis
14 as a fulcrum, is clipped on rib 13, and moves a lower
jaw 16 of the stuffed toy dog up and down synchronously
with the rotation of the cam 12.

In this configuration, the speech that is input from
the microphone 1 is analysed by speech analysis area
2, and a speech data pattern matching the characteristic
volume of the input speech is created. This speech data
pattern is input into the input area of the neural network
provided in speech recognition area 3, and is
recognised as explained below.

The explanation below is based on an example in
which several greeting words or phrases are
recognised. For example, greeting phrases such as "Good
morning," "I'm leaving," "Good day," "I'm home," and
"Good night" are used here for explanation.

Suppose that a phrase "Good morning" issued by
a non-specific speaker is input into microphone 1. The
characteristics of this speaker's "Good morning" are
analysed by speech analysis area 2 and are input into
speech recognition area 3 as a speech data pattern.

When the phrase "Good morning" input from
microphone 1 is detected as sound pressure, the data
related to the time at which the phrase "Good morning"
is recognised by the neural network of speech
recognition area 3 is supplied from clock area 5 to
coefficient setting area 4. Note that the time to be referenced
by coefficient setting area 4 is the time the speech was
recognised by speech recognition area 3 in this case.

Said speech data pattern of "Good morning" that
was input into the neural network of speech recognition
area 3 in this way is output from the output area of the
neural network as recognition data possessing a value,
instead of binary data. Here, an example in which this
value is a floating-point number between 0 and 10 is
used for explanation.

When the speaker says "Good morning" to the
stuffed toy 30, the neural network of speech recognition
area 3 outputs a recognition data value of 8.0 for "Good
morning," 1.0 for "I'm leaving," 2.0 for "Good day," 1.0
for "I'm home," and 4.0 for "Good night." The fact that
the recognition data from the neural network for the
speaker's "Good morning" is a high value of 8.0 is
understandable. The reason why the recognition data
value for "Good night" is relatively high compared to those
for "I'm leaving," "Good day," and "I'm home" is
presumed to be because the speech pattern data of "Good
morning" and "Good night" of a non-specific speaker,
analysed by speech analysis area 2, are somewhat
similar to each other. Therefore, although the probability is
nearly non-existent that the speaker's "Good morning"
will be recognised as "I'm leaving," "Good day," or "I'm
home," the probability is high that the speaker's "Good
morning" will be recognised as "Good night."

During this process, speech recognition area 3
fetches the weighting coefficient preassigned to a
recognition target speech by referencing coefficient setting
area 4, and multiplies the recognition data by this
coefficient. Because different greeting phrases are used
depending on the time of day, weighting coefficients are
assigned to various greeting phrases based on the time
of day. For example, if the current time is 7.00 a.m., 1.0
will be used as the weighting coefficient for "Good
morning," 0.9 for "I'm leaving," 0.7 for "Good day," 0.6 for "I'm
home," and 0.5 for "Good night," and these relationships
among recognition target speeches, time of day, and
coefficients are stored in coefficient setting area 4 in
advance.

When weighting coefficients are used in this way,
the final recognition data of "Good morning" will be 8.0
(i.e., 8.0 × 1.0) since the recognition data for "Good
morning" output by the neural network is 8.0 and the
coefficient for "Good morning" at 7.00 a.m. is 1.0.
Likewise, the final recognition data for "I'm leaving" will be
0.9 (i.e., 1.0 × 0.9), the final recognition data for "Good
day" will be 1.4 (i.e., 2.0 × 0.7), the final recognition data
for "I'm home" will be 0.6 (i.e., 1.0 × 0.6), and the final
recognition data for "Good night" will be 2.0 (i.e., 4.0 ×
0.5). In this way, speech recognition area 3 creates final
recognition data by taking time-dependent weighting
coefficients into consideration.
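
By way of illustration only, the 7.00 a.m. calculation above can be reproduced with a short sketch. The recognition data values and weighting coefficients are those given in the text; the names raw_scores, weights_7am and final_recognition_data are assumptions introduced for this sketch and do not describe the actual circuitry of speech recognition area 3.

```python
# Recognition data output by the neural network for a spoken "Good morning".
raw_scores = {
    "Good morning": 8.0,
    "I'm leaving": 1.0,
    "Good day": 2.0,
    "I'm home": 1.0,
    "Good night": 4.0,
}

# Weighting coefficients held in coefficient setting area 4 for 7.00 a.m.
weights_7am = {
    "Good morning": 1.0,
    "I'm leaving": 0.9,
    "Good day": 0.7,
    "I'm home": 0.6,
    "Good night": 0.5,
}


def final_recognition_data(raw, weights):
    """Multiply each phrase's recognition data by its weighting coefficient."""
    return {phrase: score * weights[phrase] for phrase, score in raw.items()}


final = final_recognition_data(raw_scores, weights_7am)
recognised = max(final, key=final.get)  # phrase with the largest final value

for phrase, score in final.items():
    print(f"{phrase}: {score:.1f}")
print("Recognised:", recognised)
```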

When the final recognition data are determined by
taking time-dependent weighting coefficients into
consideration in this way, the final recognition data for
"Good morning" is four times as large as that for "Good
night." As a result, speech recognition area 3 can
accurately recognise the phrase "Good morning" when it is
issued by the speaker. Note that the number of phrases
that can be recognised can be set to any value.

The final recognition data of the phrase "Good
morning" determined in this way is input into speech
synthesis area 6 and drive control area 7. Speech
synthesis area 6 converts the final recognition data from
speech recognition area 3 to pre-determined speech
synthesis data, and outputs that speech synthesis
output from speaker 8. For example, "Good morning" will
be output from speaker 8 in response to the final
recognition data of the phrase "Good morning" in this case.
That is, when the child playing with the stuffed toy says
"Good morning" to the toy, the toy responds with "Good
morning." This is because the phrase issued and the
time of day match each other since the child says "Good
morning" at 7.00 a.m. As a result, "Good morning" is
correctly recognised and an appropriate response is
returned.

At the same time, drive control area 7 drives
individual action mechanisms according to the drive
conditions pre-determined for said final recognition data.
Here, the mouth of the stuffed toy dog 30 is moved
synchronously with the output signal ("Good morning" in
this case) from speech synthesis area 6. Naturally, in
addition to moving the mouth of the stuffed toy, it is
possible to move any other areas, such as shaking the head
or tail, for example.

Next, a case in which the current time is 8.00 p.m.
is explained. In this case, 0.5 is set as the weighting
coefficient for "Good morning," 0.6 for "I'm leaving," 0.7 for
"Good day," 0.9 for "I'm home," and 1.0 for "Good night."

When weighting coefficients are used in this way,
the final recognition data of "Good morning" will be 4.0
(i.e., 8.0 × 0.5) since the recognition data for "Good
morning" output by the neural network is 8.0 and the
weighting coefficient for "Good morning" at 8.00 p.m. is
0.5. Likewise, the final recognition data for "I'm leaving"
will be 0.6 (i.e., 1.0 × 0.6), the final recognition data for
"Good day" will be 1.4 (i.e., 2.0 × 0.7), the final
recognition data for "I'm home" will be 0.9 (i.e., 1.0 × 0.9), and
the final recognition data for "Good night" will be 4.0
(i.e., 4.0 × 1.0).

In this way, speech recognition area 3 creates final
recognition data by taking weighting coefficients into
consideration. Since the final recognition data for both
"Good morning" and "Good night" are 4.0, the two
phrases cannot be differentiated. In other words, when
the speaker says "Good morning" at 8.00 p.m., it is not
possible to determine whether the phrase is "Good
morning" or "Good night."
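
By way of illustration only, the tie described above can be detected as sketched below. The values are those given in the text for a "Good morning" spoken at 8.00 p.m.; the function name top_candidates and the small tolerance used when comparing floating-point values are assumptions made for this sketch.

```python
# Recognition data for a spoken "Good morning" (same values as at 7.00 a.m.).
raw_scores = {
    "Good morning": 8.0,
    "I'm leaving": 1.0,
    "Good day": 2.0,
    "I'm home": 1.0,
    "Good night": 4.0,
}

# Weighting coefficients for 8.00 p.m.
weights_8pm = {
    "Good morning": 0.5,
    "I'm leaving": 0.6,
    "Good day": 0.7,
    "I'm home": 0.9,
    "Good night": 1.0,
}


def top_candidates(final, tolerance=1e-9):
    """Return every phrase whose final recognition data ties for the maximum."""
    best = max(final.values())
    return [p for p, score in final.items() if best - score <= tolerance]


final = {p: score * weights_8pm[p] for p, score in raw_scores.items()}
candidates = top_candidates(final)
is_ambiguous = len(candidates) > 1  # more than one phrase shares the top value

print(candidates, is_ambiguous)
```

When is_ambiguous is true, the sketch leaves the choice of response to the caller rather than picking one of the tied phrases arbitrarily.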

This final recognition data is supplied to speech
synthesis area 6 and drive control area 7, both of which
act accordingly. That is, speech synthesis area 6
converts the final recognition data to pre-determined
ambiguous speech synthesis data and outputs it. For
example, "Something is funny here!" is output from
speaker 8, indicating that "Good morning" is not appropriate
for use at night time.

At the same time, drive control area 7 drives
individual action mechanisms according to the drive
conditions pre-determined for said final recognition data.
Here, the mouth of the stuffed toy dog is moved
synchronously with the output signal ("Something is funny
here!" in this case) from speech synthesis area 6.
Naturally, in addition to moving the mouth of the stuffed toy,
it is possible to move any other areas, as in the case
above.

Next, a case in which the speaker says "Good night"
when the current time is 8.00 p.m. is explained. In this
case, it is assumed that the neural network of speech
recognition area 3 outputs a recognition data value of
4.0 for "Good morning," 1.0 for "I'm leaving," 2.0 for
"Good day," 1.0 for "I'm home," and 8.0 for "Good
night." When the current time is 8.00 p.m., 0.5 will be used as
the weighting coefficient for "Good morning," 0.6 for "I'm
leaving," 0.7 for "Good day," 0.9 for "I'm home," and 1.0
for "Good night." When weighting coefficients are used
in this way, the final recognition data of "Good morning"
will be 2.0 (i.e., 4.0 × 0.5) since the recognition data for
"Good morning" output by the neural network is 4.0 and
the weighting coefficient for "Good morning" at 8.00 p.m.
is 0.5. Likewise, the final recognition data for "I'm
leaving" will be 0.6 (i.e., 1.0 × 0.6), the final recognition data
for "Good day" will be 1.4 (i.e., 2.0 × 0.7), the final
recognition data for "I'm home" will be 0.9 (i.e., 1.0 × 0.9),
and the final recognition data for "Good night" will be 8.0
(i.e., 8.0 × 1.0). In this way, speech recognition area 3
creates final recognition data by taking weighting
coefficients into consideration.

When the final recognition data is determined by
taking time-related information into consideration in this
way, the final recognition data for "Good night" is four
times as large as that for "Good morning." As a result,
speech recognition area 3 can accurately recognise the
phrase "Good night" when it is issued by the speaker.

The final recognition data of the phrase "Good
night" determined in this way is input into speech
synthesis area 6 and drive control area 7. Speech
synthesis area 6 converts the final recognition data from
speech recognition area 3 to pre-determined speech
synthesis data, and outputs that speech synthesis
output from speaker 8. For example, "Good night" will be
output from speaker 8 in response to the final
recognition data of the phrase "Good night" in this case.
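
By way of illustration only, the effect of the 8.00 p.m. weighting on a spoken "Good night" can be checked with the sketch below. The products are computed directly from the recognition data and the 8.00 p.m. coefficients listed above; the helper name margin is an assumption, and the ratio it returns is simply one way of expressing how clearly the best candidate stands out.

```python
# Recognition data assumed in the text for a spoken "Good night".
raw_scores = {
    "Good morning": 4.0,
    "I'm leaving": 1.0,
    "Good day": 2.0,
    "I'm home": 1.0,
    "Good night": 8.0,
}

# Weighting coefficients for 8.00 p.m.
weights_8pm = {
    "Good morning": 0.5,
    "I'm leaving": 0.6,
    "Good day": 0.7,
    "I'm home": 0.9,
    "Good night": 1.0,
}


def margin(scores):
    """Ratio of the highest value to the second-highest value."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] / ranked[1]


final = {p: score * weights_8pm[p] for p, score in raw_scores.items()}

print("Margin before weighting:", margin(raw_scores))
print("Margin after weighting:", margin(final))
print("Recognised:", max(final, key=final.get))
```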

Although the response from the stuffed toy 30 is
"Good morning" or "Good night" in response to the
speaker's "Good morning" or "Good night", respectively,
in the above explanation, it is possible to set many kinds
of phrases as the response. For example, "You're up
early today" can be used in response to "Good morning."
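
By way of illustration only, the association between a recognised phrase and its response can be sketched as a lookup table. The phrases are taken from the text; the table layout, the names RESPONSES and choose_response, the variant parameter, and the rule of falling back to the ambiguous response for an undifferentiated or unregistered phrase are all assumptions made for this sketch.

```python
AMBIGUOUS_RESPONSE = "Something is funny here!"

# Response phrases keyed by the recognised phrase. Several responses may be
# registered for one phrase; how one is selected is an assumption here.
RESPONSES = {
    "Good morning": ["Good morning", "You're up early today"],
    "Good night": ["Good night"],
}


def choose_response(recognised, variant=0):
    """Return the response phrase for a recognised phrase.

    `recognised` is None when the phrases could not be differentiated.
    """
    options = RESPONSES.get(recognised)
    if not options:
        return AMBIGUOUS_RESPONSE
    return options[variant % len(options)]


print(choose_response("Good morning"))
print(choose_response("Good morning", variant=1))
print(choose_response(None))
```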

Furthermore, although the time of day was used as
the variable data for setting weighting coefficients in
Working example 1, it is also possible to set weighting
coefficients based on other data such as temperature,
weather, and date. For example, if temperature is used
as the variable data, temperature data is detected from
a temperature sensor that measures the air
temperature, and weighting coefficients are assigned to the
recognition data for weather-related greeting phrases
(e.g., "It's hot, isn't it?" or "It's cold, isn't it?") that are input
and to other registered recognition data. In this way, the
difference in the values of the two recognition data is
magnified by their weighting coefficients even if a
speech data pattern that is similar to the input speech
exists, thus increasing the recognition rate.
Furthermore, if a combination of variable data such as time of
day, temperature, weather, and date is used and
weighting coefficients are assigned to these variable
data, the recognition rate for various greeting phrases can
be increased even further.
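
By way of illustration only, temperature-dependent weighting can be sketched in the same way as the time-of-day case. The text gives no numerical values for this case, so the thresholds, the coefficients and the recognition data below are entirely hypothetical, as is the function name temperature_weights.

```python
HOT = "It's hot, isn't it?"
COLD = "It's cold, isn't it?"


def temperature_weights(temperature_c, hot_threshold=25.0, cold_threshold=10.0):
    """Return hypothetical weighting coefficients for two weather greetings."""
    if temperature_c >= hot_threshold:
        return {HOT: 1.0, COLD: 0.5}
    if temperature_c <= cold_threshold:
        return {HOT: 0.5, COLD: 1.0}
    # Mild temperature: neither phrase is favoured.
    return {HOT: 0.8, COLD: 0.8}


# Hypothetical recognition data for two phrases with similar speech patterns.
raw_scores = {HOT: 6.0, COLD: 5.0}

weights = temperature_weights(30.0)  # hypothetical sensor reading in Celsius
final = {p: score * weights[p] for p, score in raw_scores.items()}

print(final)
```

With these hypothetical numbers the gap between the two phrases grows from 1.0 before weighting to 3.5 after weighting, which is the magnifying effect the paragraph above describes.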

(Working example 2)

Next, Working example 2 of the present invention
will be explained with reference to Figure 2. Note that
the stuffed toy dog 30, motion mechanism 10 for moving
the mouth of the stuffed toy, etc. are omitted from Figure
2. Figure 2 is different from Figure 1 in that a memory
area 21 is provided for storing the weighting coefficients
for recognisable phrases that are set by coefficient
setting area 4 according to time data. Since all other
configuration elements are identical to those in Figure 1, like
symbols are used to represent like parts. The processing
between memory area 21 and coefficient setting area 4
will be explained later.

In Figure 2, the speech that is input from
microphone 1 is analysed by speech analysis area 2, and a
speech data pattern matching the characteristic volume
of the input speech is created. This speech data pattern
is input into the input area of the neural network provided
in speech recognition area 3, and is recognised as
explained below.

The explanation below is based on an example in
which several greeting words or phrases are
recognised. For example, greeting phrases such as "Good
morning," "I'm leaving," "Good day," "I'm home," and
"Good night" are used here for explanation.

Suppose that a phrase "Good morning" issued by
a non-specific speaker is input into microphone 1. The
characteristics of this speaker's "Good morning" are
analysed by speech analysis area 2 and are input into
speech recognition area 3 as a speech data pattern.

When the phrase "Good morning" input from
microphone 1 is detected as sound pressure, the data
related to the time at which the phrase "Good morning"
is recognised by the neural network of speech
recognition area 3 is supplied from clock area 5 to
coefficient setting area 4. Note that the time to be referenced
by coefficient setting area 4 is the time the speech was
recognised by speech recognition area 3 in this case.

Said speech data pattern of "Good morning" that
was input into the neural network of speech recognition
area 3 in this way is output from the output area of the
neural network as recognition data possessing a value
instead of binary data. Here, an example in which this
value is a floating-point number between 0 and 10 is
used for explanation.

When the speaker says "Good morning" to the stuffed toy 30, the
neural network of speech recognition area 3 outputs a
recognition data value of 8.0 for "Good morning," 1.0 for
"I'm leaving," 2.0 for "Good day," 1.0 for "I'm home," and 4.0
for "Good night." The fact that the recognition data from the
neural network for the speaker's "Good morning" is a high value
of 8.0 is understandable. The reason why the recognition data
value for "Good night" is relatively high compared to those for
"I'm leaving," "Good day," and "I'm home" is presumed to be
that the speech pattern data of "Good morning" and "Good night"
of a non-specific speaker, analysed by speech analysis area 2,
are somewhat similar to each other. Therefore, although the
probability is nearly non-existent that the speaker's
"Good morning" will be recognised as "I'm leaving," "Good day,"
or "I'm home," the probability is high that the speaker's
"Good morning" will be recognised as "Good night." Up to this
point, Working example 2 is nearly identical to Working
example 1.

Speech recognition area 3 fetches the weighting coefficient
assigned to a recognition target speech according to time data
by referencing coefficient setting area 4. However, in Working
example 2, memory area 21 is connected to coefficient setting
area 4, and the content (weighting coefficients) stored in
memory area 21 is referenced by coefficient setting area 4.
Note that coefficient setting area 4 outputs a large weighting
coefficient to the recognition data of a phrase if the phrase
occurs at the time of day it was most frequently recognised,
and outputs a smaller weighting coefficient to the recognition
data of the phrase as the phrase occurs away from said time of
day. In other words, the largest weighting coefficient is
assigned to the recognition data when the phrase occurs at the
time of day with the highest usage frequency, and a smaller
weighting coefficient is assigned to the recognition data as
the phrase occurs away from said time of day.

For example, assume that the current time is 7.00 a.m., that
1.0 is used as the initial weighting coefficient for
"Good morning," 0.9 for "I'm leaving," 0.7 for "Good day," 0.6
for "I'm home," and 0.5 for "Good night," and that these
coefficients are stored in memory area 21. The final
recognition data of "Good morning" will then be 8.0 (i.e.,
8.0 × 1.0), since the recognition data for "Good morning"
output by the neural network is 8.0 and the coefficient for
"Good morning" fetched from memory area 21 at 7.00 a.m. is 1.0.
Likewise, the final recognition data will be 0.9 for
"I'm leaving," 1.4 for "Good day," 0.6 for "I'm home," and 2.0
for "Good night." These final recognition data are initially
created by speech recognition area 3.
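
The weighting arithmetic in this example can be sketched in
Python as follows. The recognition values and the 7.00 a.m.
coefficients are the ones quoted in the text; the function name
apply_coefficients and the dictionary layout are illustrative
assumptions and are not part of the described device.

```python
# Recognition data output by the neural network for the utterance
# "Good morning" (values taken from the example in the text).
recognition_data = {
    "Good morning": 8.0,
    "I'm leaving": 1.0,
    "Good day": 2.0,
    "I'm home": 1.0,
    "Good night": 4.0,
}

# Weighting coefficients held in memory area 21 for 7.00 a.m.
# (values taken from the example in the text).
coefficients_7am = {
    "Good morning": 1.0,
    "I'm leaving": 0.9,
    "Good day": 0.7,
    "I'm home": 0.6,
    "Good night": 0.5,
}


def apply_coefficients(recognition, coefficients):
    """Multiply each phrase's recognition data by its coefficient.

    Hypothetical helper: the name and signature are assumptions.
    """
    return {
        phrase: value * coefficients[phrase]
        for phrase, value in recognition.items()
    }


final_data = apply_coefficients(recognition_data, coefficients_7am)
# The phrase with the largest final recognition data is selected.
best_phrase = max(final_data, key=final_data.get)
```

Under these values, best_phrase is "Good morning", and its final
recognition data is four times that of the runner-up,
"Good night".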

Even when recognition is performed by taking into consideration
the weighting coefficient based on the time of day, there is
some range of time in which a certain phrase will be correctly
recognised. For example, the phrase "Good morning" may be
correctly recognised at 7.00 a.m., 7.30 a.m., or 8.00 a.m. By
taking this factor into consideration, memory area 21 stores
the largest weighting coefficient for a phrase when it occurs
at the time of day with the highest usage frequency based on
the time data for recognising that phrase in the past, and
stores a smaller weighting coefficient for the phrase as it
occurs away from said time of day.

For example, if the phrase "Good morning" was most frequently
recognised at 7.00 a.m. according to the past statistics, the
coefficient to be applied to the recognition data of
"Good morning" is set the largest when the time data indicates
7.00 a.m., and smaller as the time data deviates farther away
from 7.00 a.m. That is, the coefficient is set at 1.0 for
7.00 a.m., 0.9 for 8.00 a.m., and 0.8 for 9.00 a.m., for
example. The time data used for setting coefficients is
statistically created based on several past time data instead
of just one time data. Note that the coefficients during the
initial setting are set to standard values for pre-determined
times of day. That is, in the initial state, the weighting
coefficient for "Good morning" at 7.00 a.m. is set to 1.0.

The coefficient of the "Good morning" that is most recently
recognised is input into memory area 21 as new coefficient data
along with the time data, and memory area 21 updates the
coefficient for the phrase based on this data and past data as
needed.
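
A minimal Python sketch of a coefficient that is largest at the
most frequent recognition time and shrinks as the time moves
away from it. The step of 0.1 per hour reproduces the 1.0, 0.9
and 0.8 values quoted above; the linear shape beyond 9.00 a.m.,
the lower bound of 0.1, the use of the mean of past recognition
times as the peak, and all function names are assumptions made
purely for illustration.

```python
def coefficient_for_time(hour, peak_hour, step=0.1, floor=0.1):
    """Return 1.0 at peak_hour, falling by `step` per hour of distance.

    The linear decay and the floor are assumptions; the text only
    gives 1.0, 0.9 and 0.8 for 7.00, 8.00 and 9.00 a.m.
    """
    distance = abs(hour - peak_hour)
    # Measure distance on a 24-hour clock (23.00 and 1.00 are 2 h apart).
    distance = min(distance, 24 - distance)
    return max(floor, round(1.0 - step * distance, 2))


def updated_peak_hour(past_hours):
    """Estimate the peak time from several past recognition times.

    Assumption: the mean is used, and the times do not straddle
    midnight. The text only says the time data is created
    statistically from several past time data.
    """
    return sum(past_hours) / len(past_hours)


peak = updated_peak_hour([7.0, 7.5, 8.0])
morning_coefficients = [coefficient_for_time(h, 7) for h in (7, 8, 9, 12)]
```

With a peak of 7.00 a.m., this sketch gives a coefficient of 0.5
at 12 noon, which illustrates how a "Good morning" spoken at
noon ends up with much smaller final recognition data.
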

By making the coefficient for a phrase the largest at the time
of day when it is used most frequently, when the phrase
"Good morning" is issued at around 7.00 a.m., the final
recognition data of "Good morning" will be 8.0 (i.e.,
8.0 × 1.0), since the recognition data for "Good morning"
output by the neural network is 8.0 and the coefficient for
"Good morning" fetched from memory area 21 at 7.00 a.m. is 1.0.
Since this final recognition data is at least four times larger
than those of other phrases, the phrase "Good morning" is
correctly recognised by speech recognition area 3.

The final recognition data of the phrase "Good morning"
determined in this way is input into speech synthesis area 6
and drive control area 7. Speech synthesis area 6 converts the
final recognition data from speech recognition area 3 to
pre-determined speech synthesis data, and a pre-set phrase such
as "Good morning" or "You're up early today" is returned
through speaker 8 embedded in the body of the stuffed toy dog,
as a response to the speaker's "Good morning."

On the other hand, if "Good morning" is issued at around
12 noon, the coefficient for "Good morning" becomes small,
making the final recognition data for "Good morning" small, and
"Good morning" will not be recognised. In such a case, speech
synthesis area 6 is programmed to issue a corresponding phrase
as in Working example 1, and a response such as "Something is
funny here!" is issued by stuffed toy 30.

(Working example 3)

Next, Working example 3 of the present invention will be
explained with reference to Figure 3. Note that the stuffed toy
dog 30, the motion mechanism 10 for moving the mouth of the
stuffed toy, etc. shown in Figure 1 are omitted from Figure 3.
Working example 3 is provided with microphone 1 for entering
speech from outside; speech analysis area 2 for analysing the
speech that is input from said microphone 1 and for generating
a speech data pattern that matches the characteristic volume of
the input speech; clock area 5 for outputting timing data;
speech recognition area 3 for outputting the recognition data
for said input speech based on the speech data pattern output
by said speech analysis area 2; speech synthesis area 6 for
outputting the speech synthesis data that corresponds to the
final recognition data recognised by taking said coefficient
from said speech recognition area 3 into consideration; drive
control area 7 for driving motion mechanism 10 (see Figure 1),
which moves the mouth, etc. of the stuffed toy 30 according to
the drive conditions that are predetermined in correspondence
to the recognition data recognised by said speech recognition
area 3; speaker 8 for outputting the content of the speech
synthesised by said speech synthesis area 6 to the outside; and
power supply area 9 for driving all of the above areas; and is
additionally provided with response content level generation
area 31, response content level storage area 32, and response
content creation area 33.

Said speech recognition area 3 in the example uses a neural
network that handles a non-specific speaker as its recognition
means. However, the recognition means is not limited to the
method that handles a non-specific speaker, and other known
methods, such as a method that handles a specific speaker, DP
matching, and HMM, can be used as the recognition means.

Said response content level generation area 31 generates
response level values for increasing the level of response
content as time passes or as the number of recognitions by
speech recognition area 3 increases. Response content level
storage area 32 stores the relationship between the response
level generated by response content level generation area 31
and time. That is, the relationship between the passage of time
and level value is stored, e.g., level 1 when the activation
switch is turned on for the first time after the stuffed toy is
purchased, level 2 after 24 hours pass, and level 3 after 24
more hours pass.

When it receives the final recognition data from speech
recognition area 3, said response content creation area 33
references said response content level generation area 31 and
determines response content that corresponds to the response
content level value. During this process, response content
level generation area 31 fetches the response content level
that corresponds to the time data from response content level
storage area 32. For example, response content level 1 is
fetched if the current time is within the first 24 hours after
the switch was turned on for the first time, and level 2 is
fetched if the current time is between the 24th and 48th hours.

Response content creation area 33 then creates response data
possessing the response content that corresponds to the fetched
response content level, based on the recognition data from
speech recognition area 3. For example, "Bow-wow" is returned
for recognition data "Good morning" when the response content
level (hereafter simply referred to as "level") is 1, a broken
"G-o-o-d mor-ning" for level 2, "Good morning" for level 3, and
"Good morning. It's a nice day, isn't it?" for a higher level
n. In this way, both the response content and level are
increased as time passes. The response data created by said
response content creation area 33 is synthesised into speech by
speech synthesis area 6 and is output from speaker 8.
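
The level lookup and response selection described above can be
sketched in Python as follows. The phrases and the 24-hour step
come from the text; the function names, the table layout, and
the choice to map every level above 3 to the longest phrase are
assumptions for illustration only.

```python
# Responses to "Good morning" by response content level (from the text).
GOOD_MORNING_RESPONSES = {
    1: "Bow-wow",
    2: "G-o-o-d mor-ning",
    3: "Good morning",
}
# Response for "a higher level n"; using it for all levels above 3
# is an assumption.
HIGHER_LEVEL_RESPONSE = "Good morning. It's a nice day, isn't it?"


def response_level(hours_since_first_switch_on, hours_per_level=24):
    """Level 1 in the first 24 hours, level 2 in the next 24, and so on."""
    return int(hours_since_first_switch_on // hours_per_level) + 1


def good_morning_response(level):
    """Look up the response to "Good morning" for a given level."""
    return GOOD_MORNING_RESPONSES.get(level, HIGHER_LEVEL_RESPONSE)


# Example: 30 hours after the toy was first switched on.
reply = good_morning_response(response_level(30))
```

The hours_per_level parameter reflects the remark in the text
that the unit of time for a level increase need not be one day.
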

Suppose that a phrase "Good morning" issued by a non-specific
speaker is input into microphone 1. The characteristics of this
speaker's "Good morning" are analysed by speech analysis area 2
and are input into speech recognition area 3 as a speech data
pattern.

Said speech data pattern of "Good morning" that was input into
the neural network of speech recognition area 3 in this way is
output from the output area of the neural network as
recognition data possessing a value, instead of binary data. If
the recognition data for the phrase "Good morning" is higher
than the recognition data for other phrases, speech recognition
area 3 correctly recognises the speaker's "Good morning" as
"Good morning."

The recognition data for the phrase "Good morning" thus
identified is input into response content creation area 33.
Response content creation area 33 then determines the response
content for the input recognition data, based on the input
recognition data and the content of response content level
generation area 31.

As explained above, the response level value from said response
content level generation area 31 is used for gradually
increasing the level of response content in response to the
phrase issued by the speaker; and in this case, the level is
increased as time passes based on the time data of clock
area 5. However, it is also possible to change the level value
based on the number or types of phrases recognised, instead of
the passage of time. Alternatively, it is possible to change
the level value based on the combination of the passage of time
and the number or types of phrases recognised.

Working example 3 is characterised in that it provides an
illusion that the stuffed toy is growing up like a living
creature as time passes. In other words, the stuffed toy can
only respond with "Bow-wow" to "Good morning" on the first day
after being purchased because the response level is only 1.
However, on the second day, it can respond with
"G-o-o-d mor-ning" to "Good morning" because the response level
is 2. Furthermore, after several days, the stuffed toy can
respond with "It's a nice day, isn't it?" to "Good morning"
because of a higher level.

Although the unit of time for increasing the response content
by one level was set at 1 day (24 hours) in the above
explanation, the unit is not limited to 1 day, and it is
possible to use a longer or shorter time span for increasing
the level. Note that it will be possible to reset any level
increase if a reset switch for resetting the level is provided.
For example, it will be possible to reset the level back to the
initial value when level 3 has been reached.

Although the above explanation was provided for the response to
the phrase "Good morning," it is not limited to "Good morning"
and is naturally applicable to upgrading of responses to other
phrases such as "Good night" and "I'm leaving." Take
"Good night" for example. The content of the response from the
stuffed toy in reply to "Good night" can be changed from
"Unn-unn" (puppy cry) in level 1 to "G-o-o-d nigh-t" in
level 2.

By increasing the level of response content in this way, the
stuffed toy dog can be made to appear to be changing the
content of its response as it grows. The toy can then be made
to act like a living creature by making it respond differently
as time passes even when the same phrase "Good morning" is
recognised. Furthermore, the toy is not boring because it
responds with different phrases even when the speaker says the
same thing.

Working example 3 is also useful for training the speaker to
find out the best way to speak to the toy in order to obtain a
high recognition rate when the toy's response content level
value is still low. That is, when the speaker does not
pronounce "Good morning" in a correct way, the "Good morning"
will not be easily recognised, often resulting in a low
recognition rate. However, if the toy responds with "Bow-wow"
to "Good morning," this means that the "Good morning" was
correctly recognised. Therefore, if the speaker practises
speaking in a recognisable manner early on, the speaker learns
how to speak to be recognised. Consequently, the speaker's
phrases will be recognised at high rates even when the response
content level value increases, resulting in smooth
interactions.

(Working example 4)

Next, Working example 4 of the invention will be explained with
reference to Figure 4. Note that stuffed toy dog 30, motion
mechanism 10 for moving the mouth of the stuffed toy, etc.
shown in Figure 1 are omitted from Figure 4. In Working
example 4, temperature is detected as one of the variable data
that affect the interaction, and the change in temperature is
used for changing the content of the response from response
content creation area 33 shown in Working example 3 above.
Temperature sensor 34 is provided in Figure 4, and like symbols
are used to represent like parts as in Figure 3. When it
receives the recognition data from speech recognition area 3,
said response content creation area 33 determines the response
content for stuffed toy 30 based on the recognition data and
the temperature data from temperature sensor 34. The specific
processing details will be explained later in the document.

In Figure 4, the speech that is input from microphone 1 is
analysed by speech analysis area 2, and a speech data pattern
matching the characteristic volume of the input speech is
created. This speech data pattern is input into the input area
of the neural network provided in speech recognition area 3,
and is recognised as speech.

Suppose that a phrase "Good morning" issued by a non-specific
speaker is input into microphone 1. The characteristics of this
speaker's "Good morning" are analysed by speech analysis area 2
and are input into speech recognition area 3 as a speech data
pattern.

Said speech data pattern of "Good morning" that was input into
the neural network of speech recognition area 3 in this way is
output from the output area of the neural network as
recognition data possessing a value, instead of binary data. If
the recognition data for the phrase "Good morning" is higher
than the recognition data for other phrases, speech recognition
area 3 correctly recognises the speaker's "Good morning" as
"Good morning."

The recognition data for the phrase "Good morning" thus
recognised is input into response content creation area 33.
Response content creation area 33 then determines the response
content for the input recognition data, based on the input
recognition data and the temperature data from temperature
sensor 34.

Therefore, the data content of the response to the recognition
data that is output by speech recognition area 3 can be created
according to the current temperature. For example, suppose that
the speaker's "Good morning" is correctly recognised by speech
recognition area 3 as "Good morning." Response content creation
area 33 then creates response data "Good morning. It's a bit
cold, isn't it?" in reply to the recognition data
"Good morning" if the current temperature is low. On the other
hand, response data "Good morning. It's a bit hot, isn't it?"
is created in reply to the same recognition data "Good morning"
if the current temperature is higher. The response data created
by response content creation area 33 is input into speech
synthesis area 6 and drive control area 7. The speech data
input into speech synthesis area 6 is converted into speech
synthesis data, and is output by speaker 8 embedded in the body
of the stuffed toy dog. The recognition data input into drive
control area 7 drives motion mechanism 10 (see Figure 1)
according to the corresponding pre-determined drive condition
and moves the mouth of the stuffed toy while the response is
being issued.
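
The temperature-dependent response can be sketched in Python as
below. The two phrases are taken from the text; the Celsius
thresholds, the plain "Good morning." returned between them, and
the function name are assumptions, since the text only speaks of
a "low" and a "higher" temperature.

```python
def temperature_response(temperature_c, cold_below=10.0, hot_above=28.0):
    """Choose a reply to "Good morning" from the sensed temperature.

    The thresholds are hypothetical; the text does not give values.
    """
    if temperature_c < cold_below:
        return "Good morning. It's a bit cold, isn't it?"
    if temperature_c > hot_above:
        return "Good morning. It's a bit hot, isn't it?"
    # Neutral reply for in-between temperatures (an assumption).
    return "Good morning."


cold_reply = temperature_response(5.0)
hot_reply = temperature_response(32.0)
```
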

In this way, the stuffed toy dog can be made to behave as if it
sensed a change in the temperature in its environment and
responded accordingly. The toy can then be made to act like a
living creature by making it respond differently as the
surrounding temperature changes even when the same phrase
"Good morning" is recognised. Furthermore, the toy is not
boring because it responds with different phrases even when the
speaker says the same thing.

(Working example 5)

Next, Working example 5 of the invention will be explained with
reference to Figure 5. Note that stuffed toy dog 30, motion
mechanism 10 for moving the mouth of the stuffed toy, etc.
shown in Figure 1 are omitted from Figure 5. In Working
example 5, air pressure is detected as one of the variable data
that affect the interaction, and the change in air pressure
(good or bad weather) is used for changing the content of the
response from response content creation area 33 shown in
Working example 3 above. Air pressure sensor 35 is provided in
Figure 5, and like symbols are used to represent like parts as
in Figure 3. Said response content creation area 33 receives
the recognition data from speech recognition area 3, and
determines the response content for the stuffed toy based on
the recognition data and the air pressure data from air
pressure sensor 35. The specific processing details will be
explained later in the document.

In Figure 5, the speech that is input from microphone 1 is
analysed by speech analysis area 2, and a speech data pattern
matching the characteristic volume of the input speech is
created. This speech data pattern is input into the input area
of the neural network provided in speech recognition area 3,
and is recognised as speech.

Suppose that a phrase "Good morning" issued by a non-specific
speaker is input into microphone 1. The characteristics of this
speaker's "Good morning" are analysed by speech analysis area 2
and are input into speech recognition area 3 as a speech data
pattern.

Said speech data pattern of "Good morning" that was input into
the neural network of speech recognition area 3 in this way is
output from the output area of the neural network as
recognition data possessing a value, instead of binary data. If
the recognition data for the phrase "Good morning" is higher
than the recognition data for other phrases, speech recognition
area 3 correctly recognises the speaker's "Good morning" as
"Good morning."

The recognition data for the phrase "Good morning" thus
recognised is input into response content creation area 33.
Response content creation area 33 then determines the response
content for the input recognition data, based on the input
recognition data and the air pressure data from air pressure
sensor 35.

Therefore, the data content of the response to the recognition
data that is output by speech recognition area 3 can be created
according to the current air pressure. For example, suppose
that the speaker's "Good morning" is correctly recognised by
speech recognition area 3 as "Good morning." Response content
creation area 33 then creates response data "Good morning. The
weather is going to get worse today." in reply to the
recognition data "Good morning" if the air pressure has fallen.
On the other hand, response data "Good morning. The weather is
going to get better today." is created in reply to the
recognition data "Good morning" if the air pressure has risen.
The response data created by response content creation area 33
is input into speech synthesis area 6 and drive control area 7.
The speech data input into speech synthesis area 6 is converted
into speech synthesis data, and is output by speaker 8 embedded
in the body of the stuffed toy dog. The recognition data input
into drive control area 7 drives motion mechanism 10 (see
Figure 1) according to the corresponding pre-determined drive
condition and moves the mouth of the stuffed toy while the
response is being issued.
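
A short Python sketch of the air pressure case follows. The
text only states that the reply depends on whether the air
pressure "has fallen" or "has risen"; comparing two successive
readings in hectopascals, the neutral reply for unchanged
pressure, and the function name are assumptions made for
illustration.

```python
def pressure_response(previous_hpa, current_hpa):
    """Choose a reply to "Good morning" from the pressure trend.

    Comparing two successive readings is an assumed way of deciding
    whether the pressure has fallen or risen.
    """
    if current_hpa < previous_hpa:
        return "Good morning. The weather is going to get worse today."
    if current_hpa > previous_hpa:
        return "Good morning. The weather is going to get better today."
    # Unchanged pressure: plain greeting (an assumption).
    return "Good morning."


falling_reply = pressure_response(1015.0, 1008.0)
rising_reply = pressure_response(1008.0, 1015.0)
```
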

In this way, the stuffed toy dog can be made to behave as if it
sensed a change in the air pressure in its environment and
responded accordingly. The toy can then be made to act like a
living creature by making it respond differently as the air
pressure changes even when the same phrase "Good morning" is
recognised. Furthermore, the toy is not boring because it
responds with different phrases even when the speaker says the
same thing.

(Working example 6)

Next, Working example 6 of the invention will be explained with
reference to Figure 6. Note that stuffed toy dog 30, motion
mechanism 10 for moving the mouth of the stuffed toy, etc.
shown in Figure 1 are omitted from Figure 6. In Working
example 6, calendar data is detected as one of the variable
data that affect the interaction, and the change in calendar
data (change in date) is used for changing the content of the
response. The configuration in Figure 6 is different from those
in Figures 4 and 5 in that calendar area 36 is provided in
place of temperature sensor 34 or air pressure sensor 35, and
like symbols are used to represent like parts as in Figures 4
or 5. Note that said calendar area 36 updates the calendar by
referencing the time data from the clock area (not shown in the
figure). Response content creation area 33 in Working example 6
receives speech recognition data from speech recognition
area 3, and determines the response content for the stuffed toy
based on the recognition data and the calendar data from
calendar area 36. The specific processing details will be
explained later in the document.

In Figure 6, the speech that is input from microphone 1 is
analysed by speech analysis area 2, and a speech data pattern
matching the characteristic volume of the input speech is
created. This speech data pattern is input into the input area
of the neural network provided in speech recognition area 3,
and is recognised as speech.

Suppose that a phrase "Good morning" issued by a non-specific
speaker is input into microphone 1. The characteristics of this
speaker's "Good morning" are analysed by speech analysis area 2
and are input into speech recognition area 3 as a speech data
pattern.

Said speech data pattern of "Good morning" that was input into
the neural network of speech recognition area 3 in this way is
output from the output area of the neural network as
recognition data possessing a value, instead of binary data. If
the recognition data for the phrase "Good morning" is higher
than the recognition data for other phrases, speech recognition
area 3 correctly recognises the speaker's "Good morning" as
"Good morning."

The recognition data for the phrase "Good morning" thus
recognised is input into response content creation area 33.
Response content creation area 33 then determines the response
content for the input recognition data based on the input
recognition data and the calendar data (date information which
can also include year data) from calendar area 36.

Therefore, the data content of the response to the recognition
data that is output by speech recognition area 3 can be created
according to the current date. For example, suppose that the
speaker's "Good morning" is correctly recognised by speech
recognition area 3 as "Good morning." Response content creation
area 33 then creates response data "Good morning. Please take
me to cherry blossom viewing." in reply to the recognition data
"Good morning" if the calendar data shows April 1. On the other
hand, response data "Good morning. Christmas is coming soon."
is created in reply to the same recognition data "Good morning"
if the calendar data shows December 23. Naturally, it is
possible to create a response that is different from the
previous year if the year data is available.
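
The date-dependent response can be sketched in Python as
follows. The two dates and phrases come from the text; matching
only those exact dates, the plain "Good morning." returned on
other days, and the function name are assumptions for
illustration.

```python
import datetime


def calendar_response(today):
    """Choose a reply to "Good morning" from the calendar date.

    Only the two dates quoted in the text are handled; the fallback
    reply for every other date is an assumption.
    """
    if (today.month, today.day) == (4, 1):
        return "Good morning. Please take me to cherry blossom viewing."
    if (today.month, today.day) == (12, 23):
        return "Good morning. Christmas is coming soon."
    return "Good morning."


# The years below are arbitrary; only month and day are examined.
april_reply = calendar_response(datetime.date(2000, 4, 1))
december_reply = calendar_response(datetime.date(2000, 12, 23))
```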

The response data created by response content creation area 33
is input into speech synthesis area 6 and drive control area 7.
The speech data input into speech synthesis area 6 is converted
into speech synthesis data, and is output by speaker 8 embedded
in the body of the stuffed toy dog. The recognition data input
into drive control area 7 drives motion mechanism 10 (see
Figure 1) according to the corresponding pre-determined drive
condition and moves the mouth of the stuffed toy while the
response is being issued.

In this way, the stuffed toy dog can be made to behave as if it
sensed a change in the date and responded accordingly. The toy
can then be made to act like a living creature by making it
respond differently as the date changes even when the same
phrase "Good morning" is recognised. Furthermore, the toy is
not boring because it responds with different phrases even when
the speaker says the same thing.

Although several working examples were used for explaining the
present invention, the invention can be widely applied to
electronic instruments that are used daily, such as personal
digital assistants and interactive games, in addition to toys.
Furthermore, in the third and subsequent working examples,
speech recognition area 3 can obtain the final recognition data
using weighting coefficients that take into consideration the
appropriateness of the content of the speaker's phrase relative
to a variable data such as time of day as in Working example 1
or 2, or can obtain the final recognition data using some other
method. For example, if the final recognition data is obtained
as in Working example 1 or 2 and the response content for this
final recognition data is processed as explained in Working
examples 3 through 6, the speaker's phrases can be successfully
recognised at high rates, and the response to the speaker's
phrases can match the prevailing condition much better.
Additionally, by using all of the response content processes
explained in Working examples 3 through 6, or some combination
of them, the response can match the prevailing condition much
better. For example, if Working example 2 is combined with
Working example 3, and the temperature sensor, the air pressure
sensor, and the calendar area explained in Working examples 4
through 6 are added, accurate speech recognition can be
performed that takes into consideration the appropriateness of
the content of the speaker's phrase relative to time of day,
and it is possible to enjoy changes in the level of the
response content from the stuffed toy as time passes.
Furthermore, interactions that take into account information
such as temperature, weather, and date become possible, and
thus an extremely sophisticated interactive speech recognition
device can be realised.

Thus, the interactive speech recognition device of the present
invention generates a weighting coefficient that changes as
variable data changes by matching the content of each
recognition target speech, and outputs recognition data from a
speech recognition means by taking this weighting coefficient
into consideration. Therefore, even if the recognition target
speeches contain speech data patterns possessing similar input
speech patterns, said weighting coefficient can assign higher
priority to the recognition data of the input speech than to
other catalogued recognition data. As a result, greeting
phrases related to time of day, weather, date, etc. are
recognised by considering the prevailing condition, thus
significantly improving the recognition rates.

Furthermore, when time data is used as the variable data, a
weighting coefficient that changes as time data changes is
generated by matching the content of each recognition target
speech, and recognition data is output from a speech
recognition means by taking this weighting coefficient into
consideration. Therefore, the recognition rates can be
significantly improved for time-related greeting phrases such
as "Good morning" and "Good night" that are used frequently.

Additionally, when time data is used as the variable
data, the time at which an input speech is correctly
recognised by said speech recognition means is taken from
said clock means so that the weighting coefficient for
said speech is changed according to the time data for
the correct recognition. Thus input speeches are
recognised based on the recognition data calculated by taking
the weighting coefficient into consideration. Therefore,
the recognition rates can be significantly improved for
time-related greeting phrases such as "Good morning"
and "Good night" that are used frequently. Furthermore,
the time at which input speech is correctly recognised 
is always detected, and a weighting coefficient is deter- 
mined based on the past recognition items of said 
speech, and thus it becomes possible to set weighting 
coefficients that match the actual usage conditions. 
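
As an illustration only, the following sketch derives a coefficient from the times of past correct recognitions. It assumes that times are recorded as whole hours, that the peak is the most frequent hour, and that the coefficient falls linearly with the distance from that hour down to a floor; these choices, and all names and values, are assumptions rather than details given in the specification.

```python
from collections import Counter


def peak_hour(history):
    """Hour of day at which the phrase was most often correctly recognised."""
    return Counter(history).most_common(1)[0][0]


def hour_distance(a, b):
    """Distance in hours between two clock hours, wrapping around midnight."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def learned_weight(history, now_hour, max_w=1.0, min_w=0.5, step=0.1):
    """Largest coefficient at the peak hour, smaller as the time deviates."""
    if not history:
        return max_w  # assumption: no history yet, so apply no penalty
    dist = hour_distance(peak_hour(history), now_hour)
    return max(min_w, max_w - step * dist)


# Hypothetical hours at which "Good morning" was correctly recognised.
good_morning_hours = [7, 7, 8, 7, 6]

for hour in (7, 9, 22):
    print(hour, learned_weight(good_morning_hours, hour))
```

Because the peak is recomputed from the recorded history, the coefficients in this sketch follow the hours at which the speaker actually uses each phrase.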

Time data and/or the recognition count data for cor- 
rect recognition by said speech recognition means are 
input, the response content level for changing the re- 
sponse content for the input speech is generated based 
on the input data, and a response content that matches 
this response level is output. Therefore, the response 
content level can be changed in stages in response to 
the phrase issued by the speaker. For example, when 
the invention is used in a stuffed toy animal, the increas- 
ing response content level gives the illusion that the toy 
animal is growing and changing its response as it grows. 
The toy can then be made to act like a living creature by 
making it respond differently as time passes even when 
the same phrase "Good morning" for example is recog- 
nised. Furthermore the invention provides an excellent 
effect in that the toy is not boring because it responds 
with different phrases even when the speaker says the 
same thing. Additionally, the invention provides the ef- 
fect of enabling smooth interaction because the speak- 
er's phrases will necessarily be recognised at higher 
rates when the response content level value increases, 
if the speaker learns to speak to the toy in recognisable 
ways when the toy's response content level value is still 
low. 
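
The staged response described above can be sketched as follows. The sketch assumes that the response content level rises with either the days of use or the count of correct recognitions, whichever has advanced further; the thresholds, the number of levels and the response phrases are hypothetical.

```python
# Hypothetical responses per level for one recognised phrase.
RESPONSES = {
    "Good morning": {
        1: "Morning.",
        2: "Good morning.",
        3: "Good morning! Did you sleep well?",
    },
}


def response_level(days_in_use, correct_count,
                   days_per_level=30, count_per_level=100, max_level=3):
    """Derive the response content level from time data and recognition count."""
    earned = max(days_in_use // days_per_level,
                 correct_count // count_per_level)
    return min(1 + earned, max_level)


def respond(phrase, days_in_use, correct_count):
    """Select the response that matches the current response content level."""
    level = response_level(days_in_use, correct_count)
    return RESPONSES[phrase][level]


print(respond("Good morning", 0, 0))
print(respond("Good morning", 45, 10))
print(respond("Good morning", 400, 1000))
```

The same recognised phrase thus yields a different response as the level increases, which is the effect of apparent growth described above.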

Additionally, variable data that affects the response
content is detected, and response content that takes
this variable data into consideration is output.
Therefore, sophisticated interactions become possible
that correspond to various situational changes.

The temperature of the surrounding environment 
may be measured as said variable data, and the re- 
sponse content is output based on this temperature da- 
ta. Therefore, sophisticated interactions become possi- 
ble that tailor the response to the prevailing tempera- 
ture. 

The air pressure of the surrounding environment 
may be measured as said variable data, and the re- 
sponse content is output based on this air pressure data. 
Therefore, sophisticated interactions become possible 
that tailor the response to the prevailing weather condi- 
tion. 

Finally, calendar data may be used as said variable 
data, and the response content is output based on this 
calendar data. Therefore, sophisticated interactions be-
come possible that tailor the response to the calendar.
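
As a hedged illustration of response content that takes temperature, air pressure and calendar data into consideration, the sketch below appends remarks to a base reply. The thresholds, the use of a pressure change as a weather indicator, the chosen calendar date and all phrases are assumptions made for the example only.

```python
from datetime import date


def build_response(base_reply, temperature_c=None,
                   pressure_change_hpa=None, today=None):
    """Extend a base reply using whichever variable data is available."""
    parts = [base_reply]

    # Temperature data: hypothetical thresholds in degrees Celsius.
    if temperature_c is not None:
        if temperature_c <= 5:
            parts.append("It is cold today.")
        elif temperature_c >= 30:
            parts.append("It is hot today.")

    # Air pressure data: a falling pressure is assumed to suggest worse weather.
    if pressure_change_hpa is not None:
        if pressure_change_hpa <= -3:
            parts.append("The weather may get worse.")
        elif pressure_change_hpa >= 3:
            parts.append("The weather may improve.")

    # Calendar data: one hypothetical special date.
    if today is not None and (today.month, today.day) == (1, 1):
        parts.append("Happy New Year!")

    return " ".join(parts)


print(build_response("Good morning.", temperature_c=2,
                     pressure_change_hpa=-5, today=date(1996, 1, 1)))
```

Each kind of variable data is optional in this sketch, so a device fitted with only some of the sensors would simply omit the corresponding remarks.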



Claims

1. An interactive speech recognition device for recognising
and responding to input speech comprising:

a speech analysing means (2) for analysing input
speech by comparing it to pre-registered
speech patterns and for creating a speech data
pattern;

a speech recognition means (3) for recognising
the input speech by analysing the speech data
pattern and deriving recognition data;

a speech output means (6-8) for outputting a
response to said input speech using said
recognition data; and

a coefficient setting means (4) for generating
weighted coefficients for each of the pre-registered
speech patterns and providing said
speech recognition means with said coefficients,
thereby enabling the recognition data accuracy
to be improved.

2. An interactive speech recognition device as
claimed in Claim 1, further comprising a variable
data detection means.

3. An interactive speech recognition device according 
to Claim 2, in which said variable data detection 
means comprises a timing means for detecting time 
data, and in that said coefficient setting means gen- 
erates a weighting coefficient that corresponds to 
the time of day for each of the pre-registered speech 
patterns. 

4. An interactive speech recognition device according
to Claim 3, in which the time at which an input
speech is correctly recognized by said speech
recognition means is fetched from said timing means,
the weighting coefficient for the recognition data is
given the largest value if the input speech occurs at
a time at which it was correctly recognized most
frequently in the past, and a smaller weighting
coefficient is given as the time deviates from this peak
time, based on the time data for the correct
recognition.

5. An interactive speech recognition device according
to Claim 4 in which said variable data detection
means comprises a temperature sensor that
measures the temperature of the usage environment and
outputs the temperature data, and said response
content creation means outputs the response
content data by taking said temperature data into
consideration.

6. An interactive speech recognition device according
to Claim 4 in which said variable data detection
means comprises an air pressure sensor that
measures the air pressure of the usage environment and
outputs the air pressure data, and said response
content creation means outputs the response
content data by taking said air pressure data into
consideration.

7. An interactive speech recognition device according
to Claim 4 in which said variable data detection
means comprises a calendar detection means that
detects calendar data and outputs the calendar
data, and said response content creation means
outputs the response content data by taking said
calendar data into consideration.

8. An interactive speech recognition device as is
claimed in any one of Claims 5 to 7, in which said
variable data detection means comprises two or
more of a temperature sensor, air pressure sensor
or calendar detection means.




EP 0 730 261 A3

European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 95 30 1394

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate,
of relevant passages:

US 5 029 214 A (HOLLANDER JAMES F) 2 July 1991
* abstract *
* column 9, line 6 - line 13 *
* claim 116; tables III, IV *

US 4 923 428 A (CURRAN KENNETH J) 8 May 1990
* abstract *

WO 93 06575 A (SEDLMAYR STEVEN R) 1 April 1993
* abstract *

WO 87 06487 A (SI ROTA VLADIMIR) 5 November 1987
* abstract; claims 1,6,7 *

Relevant to claim: 1,2

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): G10L3/00

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G10L, A63H

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 2 June 1997
Examiner: Van Doremalen, J

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another
    document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or
    after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding
    document





